Overfitting and Neural Networks: Conjugate Gradient and Backpropagation
Authors
Abstract
Methods for controlling the bias/variance tradeoff typically assume that overfitting or overtraining is a global phenomenon. For multi-layer perceptron (MLP) neural networks, global parameters such as the training time (e.g. based on validation tests), network size, or the amount of weight decay are commonly used to control the bias/variance tradeoff. However, the degree of overfitting can vary significantly throughout the input space of the model. We show that overselection of the degrees of freedom for an MLP trained with backpropagation can improve the approximation in regions of underfitting, while not significantly overfitting in other regions. This can be a significant advantage over other models. Furthermore, we show that “better” learning algorithms such as conjugate gradient can in fact lead to worse generalization, because they can be more prone to creating varying degrees of overfitting in different regions of the input space. While experimental results cannot cover all practical situations, our results do help to explain common behavior that does not agree with theoretical expectations. Our results suggest one important reason for the relative success of MLPs, bring into question common beliefs about neural network training regarding training algorithms, overfitting, and optimal network size, suggest alternate guidelines for practical use (in terms of the training algorithm and network size selection), and help to direct future work (e.g. regarding the importance of the MLP/BP training bias, the possibility of worse performance for “better” training algorithms, local “smoothness” criteria, and further investigation of localized overfitting).
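To make the localized-overfitting idea concrete, the following is a minimal illustrative sketch (not the paper's experimental setup): a tiny MLP is trained with plain backpropagation on a toy 1-D regression task whose target is highly non-linear in one region and nearly linear in another, and the test error is then measured separately in the two regions. The data, architecture, and hyperparameters here are assumptions for illustration only.

```python
# A minimal illustrative sketch (not the paper's experimental setup): a tiny
# 1-hidden-layer MLP is trained with plain backpropagation on a toy 1-D target
# that is highly non-linear for x < 0 and nearly linear for x >= 0, and the
# test error is then measured separately in the two regions.
import numpy as np

rng = np.random.default_rng(0)

def target(x):
    # Toy target: strongly non-linear on the left, nearly linear on the right.
    return np.where(x < 0.0, np.sin(6 * np.pi * x), 0.3 * x)

def init_mlp(n_hidden):
    return {"W1": rng.normal(0, 0.5, (1, n_hidden)), "b1": np.zeros(n_hidden),
            "W2": rng.normal(0, 0.5, (n_hidden, 1)), "b2": np.zeros(1)}

def forward(p, x):
    h = np.tanh(x @ p["W1"] + p["b1"])
    return h, h @ p["W2"] + p["b2"]

def backprop_step(p, x, y, lr=0.05):
    h, yhat = forward(p, x)
    err = yhat - y                         # error signal for a 0.5 * MSE loss
    gW2 = h.T @ err / len(x); gb2 = err.mean(0)
    dh = (err @ p["W2"].T) * (1 - h ** 2)  # backpropagate through tanh
    gW1 = x.T @ dh / len(x); gb1 = dh.mean(0)
    for k, g in zip(("W1", "b1", "W2", "b2"), (gW1, gb1, gW2, gb2)):
        p[k] -= lr * g

# Noisy training samples and a noise-free test grid.
x_tr = rng.uniform(-1, 1, (200, 1))
y_tr = target(x_tr) + rng.normal(0, 0.1, x_tr.shape)
x_te = np.linspace(-1, 1, 400).reshape(-1, 1)
y_te = target(x_te)

params = init_mlp(n_hidden=40)             # deliberately generous capacity
for epoch in range(5000):
    backprop_step(params, x_tr, y_tr)

# Fit quality can differ between the non-linear and the near-linear region.
resid = (forward(params, x_te)[1] - y_te) ** 2
print("test MSE, x < 0 (non-linear):", float(resid[x_te[:, 0] < 0].mean()))
print("test MSE, x >= 0 (near-linear):", float(resid[x_te[:, 0] >= 0].mean()))
```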
Similar references
Handwritten Character Recognition using Modified Gradient Descent Technique of Neural Networks and Representation of Conjugate Descent for Training Patterns
The purpose of this study is to analyze the performance of the backpropagation algorithm with changing training patterns and a second momentum term in feed-forward neural networks. The analysis is conducted on 250 different words of three lowercase letters from the English alphabet. These words are presented to two vertical segmentation programs, which are designed in MATLAB and based on portions (1...
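As an illustration of the kind of update rule mentioned above, the following hedged sketch adds a classical momentum term and a "second" momentum term (here read as the update from two steps back) to a plain gradient step; the coefficient names, values, and this reading of "second momentum term" are assumptions, not taken from the cited study.

```python
# Hedged sketch of a weight update with a first and a "second" momentum term
# (the previous and the previous-but-one update); values are illustrative only.
import numpy as np

def momentum_update(w, grad, prev_dw, prev_prev_dw, lr=0.1, mu1=0.5, mu2=0.2):
    """Return (new_w, dw): a gradient step plus two momentum terms."""
    dw = -lr * grad + mu1 * prev_dw + mu2 * prev_prev_dw
    return w + dw, dw

# Toy usage on a 1-D quadratic loss L(w) = 0.5 * w**2 (its gradient is w).
w, dw1, dw2 = 5.0, 0.0, 0.0
for step in range(50):
    grad = w
    w, dw = momentum_update(w, grad, dw1, dw2)
    dw1, dw2 = dw, dw1
print("w after 50 steps:", w)
```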
Backpropagation Learning for Multi-layer Feed-forward Neural Networks Using the Conjugate Gradient Method. IEEE Transactions on Neural Networks, 1991. [31] M. F. Møller, A Scaled Conjugate Gradient Algorithm for Fast Supervised Learning. Technical Report PB-339
Applying Variants of Minimal Effort Backpropagation (meProp) on Feedforward Neural Networks
Neural network training can often be slow, with the majority of training time spent on backpropagation. In July of this year, Wang et al. (2017) devised a technique called minimal effort backpropagation (meProp), which reduces the computational cost of backpropagation through neural networks by computing only the k most influential rows of the gradient for any hidden layer weight matrix. In the...
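A hedged sketch of the top-k idea described above: during backpropagation through one hidden layer, only the k largest-magnitude components of the layer's output gradient are kept, so the gradient of the weight matrix has only k non-zero rows. Shapes, names, and the single-example setting are illustrative assumptions, not the cited implementation.

```python
# Hedged sketch of minimal effort backpropagation (meProp) for one linear layer
# y = W @ x: keep only the k most influential components of dL/dy, so dL/dW has
# only k non-zero rows. Illustrative only, not the cited implementation.
import numpy as np

def meprop_layer_grads(W, x, grad_out, k):
    """W: (out_dim, in_dim), x: (in_dim,), grad_out: (out_dim,) = dL/dy.

    Returns (dL/dW, dL/dx) computed from the sparsified output gradient."""
    topk = np.argsort(np.abs(grad_out))[-k:]   # indices of the k largest-magnitude components
    sparse_g = np.zeros_like(grad_out)
    sparse_g[topk] = grad_out[topk]
    grad_W = np.outer(sparse_g, x)             # only k rows are non-zero
    grad_x = W.T @ sparse_g                    # only k rows of W are actually needed
    return grad_W, grad_x

# Toy usage.
rng = np.random.default_rng(0)
W = rng.normal(size=(8, 5)); x = rng.normal(size=5); grad_out = rng.normal(size=8)
gW, gx = meprop_layer_grads(W, x, grad_out, k=3)
print("non-zero gradient rows:", np.flatnonzero(np.abs(gW).sum(axis=1)))
```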
Overfitting in Neural Nets: Backpropagation, Conjugate Gradient, and Early Stopping
The conventional wisdom is that backprop nets with excess hidden units generalize poorly. We show that nets with excess capacity generalize well when trained with backprop and early stopping. Experiments suggest two reasons for this: 1) Overfitting can vary significantly in different regions of the model. Excess capacity allows better fit to regions of high non-linearity, and backprop often avo...
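A hedged sketch of validation-based early stopping as described above: train for many epochs, keep the weights with the lowest validation error, and stop after a fixed number of epochs without improvement. The helper function, its parameters, and the toy "model" are illustrative assumptions; any backprop training step and validation metric can be plugged in.

```python
# Hedged sketch of early stopping on a validation set; the training step and
# validation loss below are placeholders used only to show the control flow.
import copy

def train_with_early_stopping(model, train_step, val_loss, max_epochs=1000, patience=50):
    best_loss, best_model, since_best = float("inf"), copy.deepcopy(model), 0
    for epoch in range(max_epochs):
        train_step(model)                   # one backprop pass over the training set
        loss = val_loss(model)              # error on the held-out validation set
        if loss < best_loss:
            best_loss, best_model, since_best = loss, copy.deepcopy(model), 0
        else:
            since_best += 1
            if since_best >= patience:      # no improvement for `patience` epochs: stop
                break
    return best_model, best_loss

# Toy usage with a fake one-parameter "model" to show the behaviour.
model = {"w": 10.0}
best, loss = train_with_early_stopping(
    model,
    train_step=lambda m: m.update(w=m["w"] * 0.9),   # pretend gradient step
    val_loss=lambda m: (m["w"] - 1.0) ** 2 + 0.5,    # minimized near w = 1
)
print(best, loss)
```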
A conjugate gradient based method for Decision Neural Network training
Decision Neural Network is a new approach for solving multi-objective decision-making problems based on artificial neural networks. By using imprecise evaluation data, network training is improved and the number of required training data sets is reduced. The existing training method is based on gradient descent (backpropagation, BP), one of whose limitations is its slow convergence. Therefore,...
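A hedged sketch of training a small network's weights with a conjugate-gradient optimizer rather than plain gradient descent: scipy's CG routine is applied to the flattened weights of a toy MLP. The architecture, data, and use of scipy are assumptions for illustration, not the method proposed in the cited paper.

```python
# Hedged sketch: fit a tiny MLP by minimizing its loss over the flattened weight
# vector with a conjugate-gradient optimizer (scipy, method="CG"). Illustrative
# setup only, not the cited paper's training procedure.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, (100, 1)); Y = np.sin(3 * X) + rng.normal(0, 0.05, X.shape)
H = 10                                      # hidden units

def unpack(theta):
    W1 = theta[:H].reshape(1, H); b1 = theta[H:2 * H]
    W2 = theta[2 * H:3 * H].reshape(H, 1); b2 = theta[3 * H:]
    return W1, b1, W2, b2

def loss_and_grad(theta):
    W1, b1, W2, b2 = unpack(theta)
    h = np.tanh(X @ W1 + b1); yhat = h @ W2 + b2
    err = yhat - Y
    loss = 0.5 * np.mean(err ** 2)
    # Backpropagated gradients of the 0.5 * MSE loss.
    gW2 = h.T @ err / len(X); gb2 = err.mean(0)
    dh = (err @ W2.T) * (1 - h ** 2)
    gW1 = X.T @ dh / len(X); gb1 = dh.mean(0)
    return loss, np.concatenate([gW1.ravel(), gb1, gW2.ravel(), gb2])

theta0 = rng.normal(0, 0.5, 3 * H + 1)
res = minimize(loss_and_grad, theta0, jac=True, method="CG", options={"maxiter": 200})
print("final training MSE:", 2 * res.fun)
```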